Abstract — Single-path routing provided by today's widely used IGPs such as RIP makes extremely inefficient use of network bandwidth, which is evident in the large end-to-end delays that flows experience under single-path routing as compared to minimum-delay routing. Enhancements to OSPF such as optimized multipath have not proved adequate to bridge this large delay gap. Practical implementations of minimum-delay routing, on the other hand, have been largely unsuccessful for reasons such as poor scalability, slow convergence and out-of-order packet delivery. This paper proposes a traffic engineering solution that, for a given long-term traffic matrix, adapts minimum-delay routing to backbone networks in a way that is practical and suitable for implementation in a Differentiated Services framework. A simple, scalable packet forwarding technique is introduced that distinguishes between datagram traffic and traffic that requires in-order delivery and forwards each accordingly and efficiently. Using simulations we show that the delays obtained are comparable to minimum delays and far better than those of single-path routing.


I.  Introduction 

The shortest-path routing protocols such as RIP [10], EIGRP [1] and OSPF [14] that are widely used as Interior Gateway Protocols (IGPs) in today's Internet are highly inefficient with respect to bandwidth usage. To improve bandwidth utilization and reduce delays, several improvements to the basic routing protocols have been proposed [15], [19], [21], [6]. However, there has been no theoretical basis for such improvements and they have remained ad hoc. For example, in ECMP [15] load is distributed equally over multiple equal-cost paths, typically using simple round-robin distribution or hashing on the packet headers. To make optimal use of network resources and minimize delays, traffic between source-destination pairs may often have to be split and routed along multiple paths in proportions that are not necessarily equal [9]. Though OSPF-OMP [21] suggests using unequal traffic distribution on multiple paths, it provides no concrete method to find a distribution that conforms to the theoretically optimal distribution. One practical implementation of optimal routing, namely Codex [11], uses virtual circuits to set up flows between source-destination pairs to realize the optimal distribution, but the architecture introduces unacceptable complexity at the ingress and core routers and is not scalable. In [22], we showed how minimum-delay routing can be approximated and adapted to highly dynamic traffic on short time scales. The main drawback of that approach was that packets may arrive out of order at the destination, affecting protocols such as TCP. This paper is an extension of that work and addresses the out-of-order packet delivery problem.

Recently, Traffic Engineering (TE) has received tremendous attention in the Internet community and several IETF drafts have appeared [2], [18], [12]. Typically in a TE approach, for a given optimization criterion and an input traffic matrix specified over medium-to-large time scales, the goal is to determine a set of flows with associated paths and bandwidths that meet the optimization criterion. TE solutions are most effective in a network under a single administrative domain, such as that of an ISP, where knowledge of the link characteristics and input traffic matrix can be obtained. TE is usually employed in operational networks after topology and capacity planning have taken place. Traffic engineering has a wide scope and covers diverse forms of routing that address survivability, QoS and policy-based routing, apart from congestion management and bandwidth utilization. Bandwidth utilization and reduction of delays due to congestion, however, are of pressing importance and generally more difficult to address, and thus are the focus of this paper.

Despite the initial failure of the direct introduction of minimum-delay routing in networks, a traffic engineering approach to minimum-delay routing seems to have great potential for success. We explore this potential in this paper. To the best of our knowledge, this is the first attempt to construct a TE system based on the minimum-delay routing formulated in [9].

There are several algorithms to solve the minimum-delay routing problem [3], [17]. As in most optimization problems, a solution to the minimum-delay routing problem requires splitting of traffic between a source-destination pair along multiple paths. Setting up explicit paths from the source to the destination using MPLS leads to complexity at the source. On the other hand, if splitting of flows is not allowed, in order to simplify the MPLS implementation, the solutions obtained are sub-optimal [23].

To obtain an accurate implementation of the solution that is practical and scalable, we adopt ideas from the Differentiated Services model [7] and OSPF-OMP [21]. The solution to the minimum-delay problem is obtained using offline algorithms in the form of routing parameters, which are then downloaded into the routers, manually or using automated tools. The packet forwarding module then forwards packets according to the routing parameters. At the ingress router, a key is inserted in each packet. In the intermediate routers, packets are forwarded based on the key and the routing parameters. This method is better than hashing on the source-destination pair [21] as it distinguishes between the many connections between a source-destination pair. Also, differential treatment is given to datagram traffic and traffic that requires in-order delivery. The effect is that a more accurate distribution of actual traffic according to the routing parameters is achieved, and consequently the end-to-end delays are closer to the optimal. The architecture is practical and can be implemented in conjunction with other per-hop behaviors of the Differentiated Services architecture.

The paper is organized as follows. Section II describes the minimum-delay routing problem and the challenges involved in implementing it. Section III describes our implementation strategy. In Section IV, we evaluate through simulations the effectiveness of the TE system in reducing end-to-end delays and congestion. Section V concludes the paper.

II.  Minimum-delay  routing 

We use the minimum-delay routing, or optimal routing, formulated in Bertsekas and Gallager [4] as the basis for our traffic engineering technique. The formulation is reproduced here for convenience. In the next section we will describe our implementation.

Let a computer network be represented as a graph $G = (N, L)$, where $N$ is the set of routers and $L$ is the set of links between them. $N^i$ is the neighbor set of router $i$. For $i \ne j$, let $r^i_j \ge 0$ be the expected input traffic, measured in bits per second, entering the network at router $i$ and destined for router $j$. Let $P^i_j$ be the set of all directed paths connecting $i$ and $j$ in the network, and let $W = \bigcup_{i,j} P^i_j$. For $p \in W$, let $x_p \ge 0$ be the traffic rate on path $p$. If $f_{ik}$ is the expected traffic, measured in bits per second, on link $(i,k)$, where $0 \le f_{ik} < C_{ik}$ and $C_{ik}$ is the capacity of link $(i,k)$ in bits per second, from conservation of traffic we have

$$\sum_{p \in P^i_j} x_p = r^i_j \qquad (1)$$

$$f_{ik} = \sum_{p \in W :\, (i,k) \in p} x_p \qquad (2)$$

Let $D_{ik}$ be defined as the expected number of messages or packets per second transmitted on link $(i,k)$ times the expected delay per message or packet, including the queuing delays at the link. We assume $D_{ik}(\cdot)$ depends only on the flow $f_{ik}$ through link $(i,k)$ and on link characteristics such as propagation delay and link capacity. We assume $D_{ik}(f_{ik})$ is a continuous and convex function that tends to infinity as $f_{ik}$ approaches $C_{ik}$. This is the case, for example, when the link is modeled as an M/M/1 queue. The total expected delay per message times the total expected number of message arrivals per second is denoted by $D_T$ and defined as

$$D_T = \sum_{(i,k) \in L} D_{ik}(f_{ik}) \qquad (3)$$

The minimum-delay routing problem is to minimize $D_T$. This is a non-linear programming problem, and the solution is the set of flows $\{x_p \ge 0\}$. There are many algorithms to solve this problem [3], [5], [8], [9], [13], [20], [16], [17], and any of them can be used to obtain the solution. If necessary, a fast but suboptimal method described in [22] can also be used. Once the solution is obtained, mechanisms such as MPLS LSPs, virtual circuits or routing parameters are set up in the network to forward traffic along the paths specified by the solution. Our proposed implementation is based on routing parameters and is described in the next section, but before moving on let us see how the above solution can be implemented using MPLS and virtual circuits and what the drawbacks are.

We assume the TE solution is implemented in a single contiguous domain where the traffic matrix and the link characteristics are known a priori. The minimum-delay routing algorithm is first applied offline to the given traffic matrix (the set $\{r^i_j\}$) and the given network, using any of the algorithms in the references mentioned above, to obtain the flow set $\{x_p \ge 0 \mid p \in W\}$. For each flow $x_p$, an MPLS label and the associated bandwidth are obtained and a corresponding LSP (label-switched path) is set up in the network. The end result is that at each source node a set of labels with corresponding bandwidths is obtained for each destination. Each source node then distributes the traffic originating at the node for each destination according to the bandwidths and assigns labels to the packets. The main drawback of this technique is that at each source, for each destination, there can be $O(L)$ flows. Therefore there can potentially be $L$ flows as input to the weighted round-robin (WRR) procedure that distributes traffic for the same destination. This quadratic complexity can restrict the scalability of the MPLS method. In the interior routers too, the number of labels increases rapidly, but this can be controlled to a certain extent if multipoint-to-point aggregation of LSPs is used. A virtual-circuit implementation of the solution, such as Codex [11], has the same scalability problem.
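
As a concrete instance of the objective in Eq. (3), the following sketch evaluates $D_T$ under the M/M/1 model mentioned in this section, taking $D_{ik}(f_{ik}) = f_{ik}/(C_{ik} - f_{ik})$ in normalized units. This particular cost form, the function names and the toy link values are illustrative assumptions and are not taken from the paper.

```python
import math

def mm1_link_cost(flow, capacity):
    """Assumed per-link cost f / (C - f): convex in `flow`, unbounded as flow -> capacity."""
    if flow < 0:
        raise ValueError("flow must be non-negative")
    if flow >= capacity:
        return math.inf
    return flow / (capacity - flow)

def total_delay(link_flows, capacities):
    """D_T: sum of the per-link costs over all links (i, k) that carry flow."""
    return sum(mm1_link_cost(f, capacities[link]) for link, f in link_flows.items())

# Hypothetical toy network: 6 units of demand from a to b.
capacities = {("a", "b"): 10.0, ("a", "c"): 10.0, ("c", "b"): 10.0}
single_path = {("a", "b"): 6.0}                                # everything on the direct link
split = {("a", "b"): 4.0, ("a", "c"): 2.0, ("c", "b"): 2.0}    # part of it detours via c

cost_single = total_delay(single_path, capacities)
cost_split = total_delay(split, capacities)
print(cost_single, cost_split)
```

In this toy case the split assignment has the lower total cost even though it uses a longer path, which is the effect that motivates multipath routing.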


III.  Proposed  Implementation 

The complexity of the MPLS and virtual-circuit implementations can be overcome if the flows are set up using routing parameters. The routing parameter $\phi^i_{jk}$ specifies the fraction of the traffic that $i$ receives for destination $j$ that it has to forward on link $(i,k)$, and is defined as follows [4]:

$$\phi^i_{jk} = \frac{f_{ik}(j)}{\sum_{m \in N^i} f_{im}(j)} \qquad (4)$$

where $f_{ik}(j)$ is the rate of traffic destined for $j$ that flows on link $(i,k)$.

Neighbors for which the routing parameters are greater than zero are successors, and the set is defined as $S^i_j = \{k \mid \phi^i_{jk} > 0\}$. Also note that $\phi^i_{jk} \ge 0$ and $\sum_{k \in N^i} \phi^i_{jk} = 1$. The next-hop sets $S^i_j$ represent a directed acyclic graph. Observe that if the $\phi$'s are restricted to 0 or 1, the routing problem reduces to single-path routing.
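
The verbal definition of the routing parameters (the fraction of the traffic for destination $j$ that router $i$ forwards on each outgoing link) can be sketched as follows for one router and one destination. The function names and the sample flow values are hypothetical and serve only as an illustration.

```python
def routing_parameters(flows_to_dest):
    """Map {neighbor k: f_ik(j)} to {neighbor k: phi_jk} for one router i and destination j."""
    total = sum(flows_to_dest.values())
    if total <= 0:
        # No traffic for this destination: every parameter is zero.
        return {k: 0.0 for k in flows_to_dest}
    return {k: f / total for k, f in flows_to_dest.items()}

def successor_set(phi):
    """Neighbors with a strictly positive routing parameter."""
    return {k for k, value in phi.items() if value > 0}

# Hypothetical per-destination link flows at router i, in bits per second.
flows = {"k1": 3.0e6, "k2": 1.0e6, "k3": 0.0}
phi = routing_parameters(flows)
successors = successor_set(phi)
print(phi, successors)
```

Here `k3` carries no traffic for the destination, so it is excluded from the successor set, and the remaining parameters sum to one.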

Once the routing parameters are obtained, they are downloaded into the routers. The packet forwarding mechanism of the routing architecture then ensures that traffic is forwarded according to those parameters. The WRR handles at most $|N^i|$ inputs for each destination in this case, which is far more scalable than $O(L)$. Forwarding traffic in the specified proportions is non-trivial for two reasons: non-zero packet size and the requirement of in-order delivery of packets. A plain WRR distributor is inadequate to handle this situation and a more sophisticated mechanism is needed. In the rest of the section we describe this mechanism.

We assume that there are two types of traffic: one tolerates out-of-order packet delivery (e.g., UDP) and the other does not (e.g., TCP). (We will denote packets that do not require in-order delivery by UDP and those that require in-order delivery by TCP.) Because TCP traffic must be delivered in order, the granularity of allocation for TCP traffic is at the flow level, whereas for UDP traffic it is at the packet level. In the presence of only UDP traffic, accurate traffic distribution according to the routing parameters can easily be achieved using WRR. To deliver TCP traffic in order, a straightforward method that is often suggested is to apply a hash function to the packet's source and destination addresses to determine the next hop [21], [19]. In OSPF-OMP [21], only the source-destination pair is used in the CRC16 hash function, which gives only a coarse distribution. By using TCP port numbers in addition to the source and destination addresses in the hash function, a finer distribution can be achieved, but this can be expensive to implement because unpacking of the packets is required at each hop. Also, in OSPF-OMP both UDP and TCP traffic are handled identically. We show that by treating UDP and TCP traffic differently and accounting for different packet sizes, greater fidelity between the routing parameters and the actual traffic distribution can be achieved. In our approach, the hashing is decoupled from the source-destination addresses, and hybrid packet forwarding is used to improve granularity in distribution.

A.  Forwarding  Datagram  traffic 

Before proceeding to describe TCP traffic and hybrid packet forwarding, we show that, if only UDP traffic is present and $L$ is the maximum packet size, traffic can be forwarded such that each next hop receives at most $L$ more traffic than the routing parameters allow. This limitation imposed by the packet size is fundamental because packet transmission is non-preemptive. Fig. 1 below describes the procedure for forwarding when only UDP traffic is present.


procedure datagram-forwarding(P)
{Executed at i on arrival of the packet P for j.}
begin
    For some $k \in S^i_j$, let $W^i_{jk} \le \phi^i_{jk} W^i_j$;
    Forward P to neighbor k;
    $W^i_{jk} \leftarrow W^i_{jk} + \mathrm{sizeof}(P)$;
    $W^i_j \leftarrow W^i_j + \mathrm{sizeof}(P)$;
end

Fig. 1. Packet forwarding when out-of-order packet delivery is acceptable.

It can be viewed as a generalization of single-path routing; in single-path routing all packets are forwarded to a single next hop, whereas here packets are forwarded to each of the several next hops in proportions specified by the routing parameters. This procedure is extended later to handle hybrid traffic. In Fig. 1, $W^i_j$ is the total traffic that node $i$ receives and forwards for $j$, and $W^i_{jk}$ is the portion of that traffic that is forwarded to neighbor $k$. When a packet for destination $j$ is received, the node first checks which neighbor $k$ has a deficit and forwards the packet to that neighbor, after which $W^i_j$ and $W^i_{jk}$ are updated accordingly. The procedure allows at most $L$ bits in excess of what the routing parameters allow. This is the finest distribution one can obtain without breaking packets into smaller units. The proof is provided in the appendix.
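
The datagram procedure of Fig. 1 can be sketched in a few lines. The paper only requires forwarding to some successor that has a deficit; picking the successor with the largest deficit is a tie-breaking choice assumed here, and the class name, the packet-size range and the sample routing parameters are likewise hypothetical.

```python
import random

class DatagramForwarder:
    """Byte counters kept at router i for one destination j."""

    def __init__(self, phi):
        self.phi = {k: v for k, v in phi.items() if v > 0}   # successors only
        self.sent = {k: 0 for k in self.phi}                  # W_jk per next hop
        self.total = 0                                        # W_j

    def forward(self, size):
        # The largest deficit phi_k * W - W_k is never negative when the
        # parameters sum to one, so the chosen k satisfies W_k <= phi_k * W.
        k = max(self.sent, key=lambda n: self.phi[n] * self.total - self.sent[n])
        self.sent[k] += size
        self.total += size
        return k

# Small experiment: track the worst excess over the ideal share phi_k * W.
rng = random.Random(1)
MAX_SIZE = 1500          # assumed maximum packet size L, in bytes
fwd = DatagramForwarder({"a": 0.5, "b": 0.3, "c": 0.2})
worst_excess = 0.0
for _ in range(10_000):
    fwd.forward(rng.randint(40, MAX_SIZE))
    for hop, w in fwd.sent.items():
        worst_excess = max(worst_excess, w - fwd.phi[hop] * fwd.total)
print(worst_excess <= MAX_SIZE)
```

In this run the excess at every next hop stays within one maximum-size packet, in line with the bound claimed for the procedure.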

B.  Forwarding  TCP  traffic 

For TCP traffic, because the granularity is at the flow level, a distribution of traffic as fine as for UDP is very difficult to achieve, but under some reasonable assumptions a fairly accurate distribution is possible. We assume that a sufficiently large number of flows pass through a router and that the number of flows is several orders of magnitude greater than the number of next-hop choices. We also assume that the durations of TCP connections, the average rates of TCP connections and the packet sizes are all uniformly distributed, so that the bandwidth of a group of TCP connections is proportional to the number of connections in the group. In backbone networks, where there are large numbers of flows, we believe these assumptions are quite reasonable. When the number of flows is low, however, the distribution is relatively imprecise. This should be acceptable because when network load is low, delays due to congestion are not a serious problem. We proceed with these assumptions and present a procedure for TCP packet forwarding. In Section IV, we give some experimental results to validate some of these assumptions. It should be noted that under heavy load the performance of the routing scheme in the worst case only drops to that of single-path routing.

As mentioned earlier, we use an architecture similar to the Diffserv model; an obvious advantage is that it can be incorporated along with other differentiated services. At the ingress router, a randomly generated key is associated with each TCP connection, and the ingress router maintains this per-connection table. When a packet of that connection arrives, the key is inserted into the packet. Effectively, the computation of the hash is decoupled from the source and destination addresses. The codepoint (TOS or DS field) of the packet indicates, among other things, that the packet is of TCP type. Within the core the key is used to hash into the next hop. The 13-bit fragment offset field is used to hold the hash key. To prevent fragmentation in the domain, the DF bit is set. This is possible because the solution is applied in a single domain and it is feasible to know the MTUs of all the links, so the ingress node can avoid any need for the 13-bit offset field to carry fragmentation information. The per-hop behavior related to queue management can still be implemented in conjunction with minimum-delay routing. By decoupling the key from the source-destination pair, better randomness in the key distribution can be achieved and the per-packet processing at each hop can be reduced. The per-connection table could be eliminated by computing a CRC16 hash on each entering packet instead; note, however, that the Diffserv architecture already maintains some per-connection information for profiling and other purposes.
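
A minimal sketch of the ingress-side key assignment described above is given here. The class name, the use of a connection 5-tuple as the table index and the sample addresses are assumptions for illustration; expiry of idle connections and the actual rewriting of the IP header are deliberately left out.

```python
import secrets

KEY_BITS = 13   # width of the IPv4 fragment-offset field that carries the key

class IngressKeyTable:
    """Per-connection table kept at the ingress router (hypothetical layout)."""

    def __init__(self):
        self._keys = {}   # connection tuple -> random key

    def key_for(self, src, dst, sport, dport, proto="tcp"):
        conn = (src, dst, sport, dport, proto)
        if conn not in self._keys:
            # The key is drawn at random, independently of the addresses.
            self._keys[conn] = secrets.randbits(KEY_BITS)
        return self._keys[conn]

table = IngressKeyTable()
first = table.key_for("10.0.0.1", "10.0.0.2", 40000, 80)
again = table.key_for("10.0.0.1", "10.0.0.2", 40000, 80)
print(first == again)
```

Every packet of a connection receives the same key, so all of them follow the same next hop at each core router and stay in order.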

In OSPF-OMP, boundary values are associated with each next hop. We propose a different method which uses more memory but is faster. With each destination $j$ at node $i$, a hash table, denoted by $HT^i_j$, is associated. The table is a single-column table with $M^i_j$ entries. With each entry of the table a next hop $k \in S^i_j$ is associated. The next hops are distributed randomly over the range, and the fraction of entries that point to a particular next hop is proportional to its routing parameter. That is, for each $k \in S^i_j$, $m^i_{jk}$ entries of $HT^i_j$, chosen randomly, are filled with the value $k$, where $m^i_{jk} = \phi^i_{jk} M^i_j$ and $M^i_j = \sum_{k \in S^i_j} m^i_{jk}$. In other words, each entry in the hash table points to some neighbor in the successor set, and the number of entries filled with a successor is proportional to its routing parameter. By contrast, comparison with boundary values in OSPF-OMP has $O(N^i)$ complexity.

When a TCP packet for $j$ arrives at $i$, the mod function on the key is used to index into the table $HT^i_j$ to obtain the next hop. If $M^i_j$ is chosen to be a power of two, the lower $\log_2(M^i_j)$ bits of the key can be used to index into the hash table. Because of the assumptions made, this should result in each next hop receiving traffic in accordance with the routing parameters.
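
The table construction and lookup can be sketched as follows. How fractional slot counts are rounded is not specified in the text, so the largest-remainder rule used here is an assumption, as are the function names, the table size of 64 and the sample routing parameters (which are assumed to sum to one).

```python
import random

def build_next_hop_table(phi, table_size, rng=None):
    """Fill `table_size` slots with next hops, roughly phi[k] * table_size slots per hop."""
    if table_size <= 0 or table_size & (table_size - 1):
        raise ValueError("table_size must be a power of two")
    rng = rng or random.Random()
    hops = [k for k in phi if phi[k] > 0]
    counts = {k: int(phi[k] * table_size) for k in hops}
    # Slots lost to rounding go to the hops with the largest fractional parts.
    leftover = table_size - sum(counts.values())
    by_fraction = sorted(hops, key=lambda k: phi[k] * table_size - counts[k], reverse=True)
    for k in by_fraction[:leftover]:
        counts[k] += 1
    table = [k for k in hops for _ in range(counts[k])]
    rng.shuffle(table)            # spread the hops randomly over the range
    return table

def next_hop(table, key):
    """Index with the low log2(M) bits of the key; the table length is a power of two."""
    return table[key & (len(table) - 1)]

table = build_next_hop_table({"a": 0.5, "b": 0.3, "c": 0.2}, 64, random.Random(7))
print(table.count("a"), table.count("b"), table.count("c"))
print(next_hop(table, 0x1ABC))
```

The lookup is a single mask and array access, independent of the number of next hops, which is the speed advantage over scanning boundary values.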

C.  Hybrid  packet  forwarding 

Because of the lack of complete control over the forwarding of TCP traffic, the actual traffic forwarded can deviate significantly from the amounts specified by the routing parameters. The skew in the distribution introduced by the hashing mechanism can be ironed out to some extent if some UDP traffic is also present. The UDP packets can then be forwarded to neighbors that have received too little traffic compared to what the routing parameters allow. OSPF-OMP does not make this distinction. The modified packet forwarding procedure is shown in Fig. 2.


procedure hybrid-forwarding(P)
{Executed at i on arrival of packet P for j.}
begin
    if (P is a UDP packet) then
        For some $k \in S^i_j$, let $W^i_{jk} \le \phi^i_{jk} W^i_j$;
    endif
    if (P is a TCP packet) then
        Let P's header map to next-hop k;
    endif
    Forward P to neighbor k;
    $W^i_{jk} \leftarrow W^i_{jk} + \mathrm{sizeof}(P)$;
    $W^i_j \leftarrow W^i_j + \mathrm{sizeof}(P)$;
end

Fig. 2. Packet forwarding in the presence of both UDP and TCP packets.

The hybrid procedure is similar to the UDP-only forwarding procedure described in Fig. 1, except that the skew in distribution created by TCP traffic is mitigated by UDP traffic; the greater the share of UDP traffic in the total traffic, the finer the distribution of the traffic according to the routing parameters. In the Diffserv model, out-of-profile packets can be treated as datagram traffic. In Section IV, we give some performance figures regarding this. The amount of extra space required in the routing table is of the order of $O(N M N^i)$. The processing time required is of the order of $O(\log(N^i))$. No CRC16 hash is computed at each hop and no comparison with boundary values is done. We believe that the proposed TE architecture can be easily deployed in existing networks. Though hashing is not new, combining it with the Diffserv framework and hybrid packet forwarding can give significant benefits.
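
The interaction between hashed TCP traffic and deficit-steered UDP traffic can be illustrated with a small sketch of the hybrid procedure. The class name, the deliberately skewed four-slot table, the fixed packet size and the largest-deficit choice for UDP packets are all assumptions made for the example.

```python
class HybridForwarder:
    """TCP packets follow the hashed next hop; UDP packets fill the largest deficit."""

    def __init__(self, phi, tcp_table):
        self.phi = {k: v for k, v in phi.items() if v > 0}
        self.tcp_table = tcp_table            # list of next hops, length a power of two
        self.sent = {k: 0 for k in self.phi}  # W_jk per next hop
        self.total = 0                        # W_j

    def forward(self, size, tcp_key=None):
        if tcp_key is None:
            # UDP packet: any successor with a deficit will do; take the largest one.
            k = max(self.sent, key=lambda n: self.phi[n] * self.total - self.sent[n])
        else:
            # TCP packet: the key inserted at the ingress selects the next hop.
            k = self.tcp_table[tcp_key & (len(self.tcp_table) - 1)]
        self.sent[k] += size
        self.total += size
        return k

def share_of_a(udp_packets_per_round, rounds=1000):
    # Four TCP flows hash 3:1 onto a and b although the target split is 50:50.
    fwd = HybridForwarder({"a": 0.5, "b": 0.5}, ["a", "a", "a", "b"])
    for _ in range(rounds):
        for key in range(4):
            fwd.forward(1000, tcp_key=key)
        for _ in range(udp_packets_per_round):
            fwd.forward(1000)
    return fwd.sent["a"] / fwd.total

print(share_of_a(0), share_of_a(1), share_of_a(2))
```

With no UDP traffic next hop `a` carries 75% of the bytes; as the UDP share grows, the split moves toward the 50% target set by the routing parameters.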

The architecture described above can be implemented with current IP forwarding technologies with small modifications, within the Diffserv framework, without needing other forwarding technologies such as MPLS. It can also be used to implement other traffic engineering approaches that use other optimization criteria [23]. The only requirement is that the solution should be representable using routing parameters.

IV.  Experimental  Results 

We tested the effectiveness of our TE scheme through a series of simulation experiments. In each of the experiments, for the same given input traffic matrix and network configuration, the routing parameters are first computed using an offline minimum-delay routing algorithm and downloaded into the router tables. After that, packets are forwarded according to the routing parameters. The end-to-end average delays are measured for each flow. Comparisons are made between delays obtained for single-path routing, different volumes of TCP flows, and different proportions of TCP and UDP traffic.

The network used in the simulation is shown in Fig. 3. The network is a contrived network with node degree high enough to provide several next-hop choices, but low enough so that there are not too many one-hop paths. The links have a bandwidth of 5MB and a propagation delay of 100 microseconds, and packets are of size 1000 bytes.


Fig.  3.  Topology  used  in  simulations 

Experiment 1: The delays of single-path routing (SP) are compared with the delays obtained in our TE scheme. The input traffic consists of 500 identical TCP flows, 50 of them originating from each node. Fig. 4(a) makes the delay versus load comparison. The x-axis denotes the load and the y-axis the delay. The average delays of single-path routing are denoted by SP and the delays of our scheme are denoted by TE. Observe that for a given average delay, the load that the network can carry is much greater in the TE scheme. At very low loads SP performed better because of the tendency of TE to route along paths longer than the shortest paths under low loads. Fig. 4(b)-4(d) show the comparison of delays of individual flows for both schemes under three different load conditions. The x-axis in this case denotes flow IDs and the y-axis the average packet delays for the flows. Note that to reduce clutter and make the plots clearer, the flows are first sorted in ascending order of delays of single-path routing and then plotted. As can be seen, the proposed TE scheme significantly outperforms single-path routing as the load in the network increases. In Fig. 4(b), for low loads, they are very close and often SP is better. Under high loads there can be severe congestion in single-path routing, which can be avoided in multipath routing as seen in Fig. 4(d). Because of the use of multipaths in the TE scheme, congestion and, therefore, delays are reduced.

Fig. 4. Comparison of SPF and TE scheme (panel titles: Shortest-path vs TE)

Experiment 2: We test our assumption that when a large number of TCP flows are present, delays close to the minimum delays can be obtained. TCP-10 indicates that there are 10 flows between each source-destination pair. The curve UDP represents optimal routing. Observe in Fig. 5 that as the total number of flows increases from 90 to 900, the delays decrease to levels comparable to the optimal. This is because of the finer granularity in distribution.

Experiment 3: Traffic can be forwarded more accurately according to the routing parameters if the presence of UDP traffic and packet sizes are considered during packet forwarding. This experiment tests this effect. We run the experiment for traffic that is split 60-40 between TCP and UDP. As the fraction of UDP traffic increases, the average delays improve (UDP-40). As mentioned earlier, this is due to the reduction of the skew introduced by TCP packet forwarding, which OSPF-OMP does not do.

V.  Conclusions 

We proposed a traffic engineering (TE) system based on minimum-delay routing. For a given traffic matrix, multiple flows between each source-destination pair are determined using an offline algorithm and are then established using routing parameters. Packets are then forwarded according to the routing parameters using hash techniques similar to those of OSPF-OMP. Our approach differs from OSPF-OMP in several significant ways. First, we use the IETF's Differentiated Services model to implement the TE system. The hash computation for a packet is done once at the ingress node and the result is inserted into the packet. Within the network the hash key is used directly to determine the next hop without further hash computation. To further speed up forwarding, the next-hop structure uses a table with next-hop entries instead of boundary values as in OSPF-OMP. Unlike OSPF-OMP, our system treats datagram and TCP traffic differently and takes packet sizes into account, giving finer granularity of traffic distribution. The use of the Diffserv model enables complex operations, such as peeking into the TCP header, to be done only once at the ingress node. Also, finer granularity of distribution can be achieved by decoupling the hash computation from the source-destination addresses.

One problem that we have not addressed in this paper is adaptation to traffic fluctuations and adjacent link failures. We can adapt techniques similar to those in OSPF-OMP, but this problem is inherently difficult because our goal is to achieve minimum-delay routing rather than mere adjustment to congestion. Also, the TE system is described as applied to a single autonomous system and does not address inter-domain routing. This is a topic for further research.


Fig. 5. (a) Delays as a function of volume of flows (b) Delays in presence of hybrid traffic


Our packet forwarding mechanism is quite general and can be used for other optimization criteria as in [23]. The system is practical and can be implemented in conjunction with the emerging Differentiated Services architecture. It offers significant benefits in terms of delay and throughput performance over single-path routing.

Appendix

A.  Properties  of  Datagram  Forwarding  Algorithm 

For given routing parameters, we have to show that the forwarding algorithm (Fig. 1) does not forward more than $L$ bits in excess to any of the next hops, and hence that the traffic distribution is fairly accurate for UDP traffic even on a small scale. The maximum deficit is $(K-1)L$.

For simplicity, we slightly modify the notation, as all distributors are the same. Assume a stream is divided into $K$ streams according to the routing parameters. Let $W$ be the amount of traffic that has arrived so far, $W_k$ the amount of traffic that has been forwarded to stream $k$, and $\phi_k$ the corresponding routing parameter. Also, $W^p$ denotes the value of $W$ when the $p$th packet arrives. $A(t,\tau)$ is the amount of input traffic that arrives in the interval $[t,\tau]$, and $A_k(t,\tau)$ is the amount of traffic that stream $k$ receives in the same interval. Note that at time $t$, $W$ equals $A(0,t)$.

Theorem 1: In the algorithm in Fig. 1, for each substream $k$, $(K-1)L \le A_k(t,\tau) \le L + \phi_k \rho (\tau - t)$.

Proof: If a packet could not be scheduled by the algorithm, then $\forall k$, $W_k > \phi_k W$. This implies $\sum_k W_k > W$, which is impossible. This proves that every packet is successfully scheduled by the algorithm.

We first show $W_k \le L + \phi_k W$. Let it be true up to the processing of $p-1$ packets. Then, for all $k$, $W_k^{p-1} \le L + \phi_k W^{p-1}$. Let the new packet $p$ be assigned to queue $k$; we have $W_k^p \le W_k^{p-1} + L$. This implies $W_k^p \le L + \phi_k W^{p-1}$ because $W_k^{p-1} \le \phi_k W^{p-1}$. Substituting $W^p = W^{p-1} + L$, we get $W_k^p \le L + \phi_k W^p - L\phi_k$. Therefore $W_k^p \le L + \phi_k W^p$. Because at initialization $W_k = W = 0$, from induction it follows that $W_k \le L + \phi_k W$. Because $W_k = A_k(0,t)$ and $W = A(0,t) \le \rho t$, we have $A_k(0,t) \le L + \phi_k \rho t$. It follows that $A_k(0,t) \le \delta_1 + \phi_k \rho t$ and $A_k(0,\tau) \le \delta_2 + \phi_k \rho \tau$ for $\delta_1, \delta_2 \in [0, L]$. Therefore, $A_k(t,\tau) \le (\delta_1 - \delta_2) + \phi_k \rho (\tau - t)$ and $(\delta_1 - \delta_2) \le L$. That $(K-1)L \le A_k(t,\tau)$ directly follows. ■


